DETAILED ACTION
Response to Arguments 

1. Applicant's arguments, see amendment, filed 10/29/2004, with respect to the
rejection(s) of claim(s) 1-26 under 35 USC 102 and 103 have been fully considered and
are persuasive. Therefore, the rejection has been withdrawn. However, upon further 
consideration, a new ground(s) of rejection is made in view of Taniguchi et al. (US 
6484137) and Gupta et al. (US 6622171). 

Claim Rejections - 35 USC § 102 

1. The following is a quotation of the appropriate paragraphs of 35 U.S.C. 102 that
form the basis for the rejections under this section made in this Office action: 

A person shall be entitled to a patent unless - (e) the invention was described in (1) an application for 
patent, published under section 122(b), by another filed in the United States before the invention by
the applicant for patent or (2) a patent granted on an application for patent by another filed in the 
United States before the invention by the applicant for patent, except that an international application 
filed under the treaty defined in section 351(a) shall have the effects for purposes of this subsection of 
an application filed in the United States only if the international application designated the United 
States and was published under Article 21(2) of such treaty in the English language. 

2. Claims 12-13 and 19-21 are rejected under 35 U.S.C. 102(e) as being 
anticipated by Taniguchi et al. (US 6484137). 

3. Regarding claim 12, Taniguchi et al. disclose an apparatus containing a data 
structure representing an audio presentation, the data structure comprising a plurality of 
audio channels representing the audio presentation after time scaling, wherein: each 
audio channel has a corresponding time scale factor and includes a plurality of audio 
frames (col. 14, lines 7-67, where "a"=1 and "b"=1/2 or referring to figure 4c for more 
details); and each audio frame has a frame index that uniquely distinguishes the audio
frame from other audio frames in the same channel and identifies the audio frame as 
corresponding to specific audio frames in other audio channels (col. 28, lines 30-65). 

4. Regarding claim 13, Taniguchi et al. further disclose the apparatus of claim 12,
wherein audio frames that are in different channels and have the same frame index
represent the same portion of the audio presentation (col. 28, lines 30-65).
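
The data structure recited in claims 12-13 can be pictured with a minimal sketch. This is only an illustration of the claim language as characterized above, not the implementation of Taniguchi et al. or of the applicant; the names `AudioFrame`, `AudioChannel`, and `corresponding_frames` are hypothetical, and the byte payloads are placeholders.

```python
from dataclasses import dataclass, field


@dataclass
class AudioFrame:
    frame_index: int  # unique within its own channel
    payload: bytes    # placeholder for time-scaled audio data


@dataclass
class AudioChannel:
    time_scale: float  # time scale factor of this channel, e.g. 1.0 or 0.5
    frames: dict = field(default_factory=dict)  # frame_index -> AudioFrame

    def add(self, frame):
        # Enforce that a frame index distinguishes a frame within its channel.
        if frame.frame_index in self.frames:
            raise ValueError("frame index must be unique within a channel")
        self.frames[frame.frame_index] = frame


def corresponding_frames(channels, frame_index):
    """Map each channel's time scale factor to its frame with the given index.

    Frames sharing an index across channels stand for the same portion
    of the presentation.
    """
    return {ch.time_scale: ch.frames[frame_index]
            for ch in channels if frame_index in ch.frames}


normal = AudioChannel(1.0)
half = AudioChannel(0.5)
for i in range(3):
    normal.add(AudioFrame(i, b"n%d" % i))
    half.add(AudioFrame(i, b"h%d" % i))
```

Looking up index 1 with `corresponding_frames([normal, half], 1)` returns one frame per time scale factor, both standing for the same portion of the presentation.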

5. Regarding claim 19, Taniguchi et al. disclose a method for playing a 
presentation, comprising: loading a first frame from a source into a player via a network, 
the first frame representing a first portion of the presentation after scaling by a first
time-scaling factor (col. 14, lines 7-64), the first audio frame has a first channel index value
that identifies the first audio frame as being scaled by the first time scaling factor (col.
14, lines 7-49 or col. 27, lines 58-65); playing the first audio frame to provide the first
portion of the presentation with the first time scale factor (col. 14, lines 7-64); receiving a
request to change playing from the first time scaling factor to a second time scaling
factor (col. 15, lines 3-5); requesting from the source a second audio frame that has a
second channel index value that identifies the second audio frame as being scaled by
the second time-scaling factor (col. 15, line 6 to col. 16, line 20); and playing the second
audio frame after the first audio frame to provide a real-time change in the time-scale of 
the presentation (col. 15, line 6 to col. 16, line 20). 

6. Regarding claims 20-21, Taniguchi et al. further disclose that the first frame has
a first frame index value that identifies the first portion of the presentation that the first
audio frame represents, and the second frame has a second index value that identifies
a second portion of the presentation that the first audio frame represents (col. 14, lines
7-49 or col. 27, lines 58-65); and the second index value immediately follows the first
time index value (col. 14, lines 7-49 or col. 27, lines 58-65).
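
The playback method recited in claims 19-21 can be sketched as follows. This is a hedged illustration of the claimed steps only, not the method of Taniguchi et al.; `Source`, `request`, `play`, and `switch_requests` are hypothetical names, the "network" is an in-memory dictionary, and frame contents are placeholder strings.

```python
class Source:
    """Hypothetical frame source keyed by (channel index, frame index)."""

    def __init__(self, frames):
        self._frames = frames

    def request(self, channel_index, frame_index):
        # Stands in for loading one frame from the source over a network.
        return self._frames[(channel_index, frame_index)]


def play(source, n_frames, start_channel, switch_requests):
    """Play frames in index order, changing channel when a request arrives.

    switch_requests maps a frame index to the channel index (time scale)
    that should be used from that frame onward.
    """
    channel = start_channel
    played = []
    for frame_index in range(n_frames):
        # A request to change the time scale takes effect at the next frame,
        # so the frame index keeps advancing by one across the switch.
        channel = switch_requests.get(frame_index, channel)
        played.append(source.request(channel, frame_index))
    return played


frames = {(c, i): f"c{c}-f{i}" for c in (0, 1) for i in range(4)}
sequence = play(Source(frames), 4, 0, {2: 1})
```

In this sketch the frame index continues without a gap across the channel change, which is how the second index value can immediately follow the first.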

7. Claims 1-5, 9-11, 14-18, and 24-25 are rejected under 35 U.S.C. 102(e) as being 
anticipated by Gupta et al. (US 6622171). 

8. Regarding claim 1, Gupta et al. disclose an apparatus containing a data structure
representing a presentation, the data structure comprising: a first audio channel
representing an audio portion of the presentation after time scaling by a first time scale
factor (referring to figure 8 and/or col. 11, lines 1-67); and a second audio channel
representing the audio portion after time scaling by a second time scale factor that 
differs from the first time scale factor (referring to figure 8 and/or col. 11, lines 1-67). 

9. Regarding claim 14, Gupta et al. disclose a method for encoding audio data, 
comprising: performing a plurality of time scaling processes on the audio data to 
generate a plurality of time-scaled audio data sets, each time-scaled audio data set 
having a different time scale factor (figure 8 shows 3 versions of an audio signal being
scaled by 3 different time scale factors); and generating a data structure containing a
plurality of audio channels respectively corresponding to the plurality of time scaling
processes (figure 8), wherein content of each of the audio channels is derived from the
time-scaled audio data set resulting from performing the corresponding time scaling 
process on the audio data (figure 8). 

10. Regarding claims 2-3, 5, and 9, Gupta et al. further disclose that the first audio
channel comprises a plurality of frames (figure 8, audio signal is processed frame by
frame, known in the art); the second audio channel comprises a plurality of frames that
are in one-to-one correspondence with the plurality of frames in the first audio channel
(figure 8 represents 3 versions of an audio signal after being scaled by 3 different time
scale factors); and corresponding frames in the first and second audio channels
represent the same time interval of the presentation (figure 8, frames in each version of
the scaled audio signals shown in figure 8 correspond with each other), and each
frame in the first audio channel is separately compressed using a first compression
method (each frame of an audio signal is compressed on a one-by-one basis), and
wherein the data structure further comprises a data channel identifying graphics
associated with the audio presentation (col. 11, lines 44-53, selecting an appropriate
version of the video stream to combine with the audio stream), and wherein the apparatus
comprises a server connected to a network (figures 8-9).

11. Regarding claims 10-11, Gupta et al. further disclose that the apparatus
comprises: data storage in which the data structure is stored (figure 1, particularly
element 13); a decoder connected to receive a data stream, the decoder converting the
data stream for perceivable presentation (decoders 108-109 in figure 3); and selection
logic coupled to the data storage and capable of selecting a source channel for the data
stream from among a set of channels including the first audio channel and the second
audio channel (elements 130-140 in figure 4), wherein the apparatus is a standalone
device that operates on battery power (client device 11 in figure 1). 

12. Regarding claim 4, Gupta et al. further disclose that the data structure further 
comprises a third audio channel representing the audio presentation after time scaling 
by the first time scale factor, wherein each frame in the third audio channel is separately 
compressed using a second compression method (figure 8 shows 3 versions of an
audio signal that are time scaled by 3 different time scale factors, wherein each represents
an individual audio channel).

13. Regarding claims 15-18, Gupta et al. further disclose that generating the data
structure comprises: partitioning each time-scaled audio data set into a plurality of
frames (figure 3, multimedia data transmitted to the client device in packets or frames);
separately compressing each frame to produce compressed frames (since the client
device 11 includes audio/video decoders, the servers must have compressed the data
frame by frame before transmitting to the client device); and collecting the compressed
frames into the plurality of audio channels, each audio channel having a corresponding
one of the different time scale factors (figure 8 shows 3 versions of an audio signal
being scaled by three different time scale factors), wherein all frames resulting from
partitioning correspond to the same amount of time in the audio data (fixed frame size,
known in the art), wherein separately compressing each frame comprises applying a
plurality of different compression processes to generate a plurality of compressed
frames from each frame (figure 8 shows 3 time-scaled compression versions of an
audio signal); and collecting the compressed frames produces audio channels such that
in each audio channel, all compressed frames in the audio channel have the same time
scale and compression process (figure 8 shows 3 time-scaled compression versions of
an audio signal).

14. Regarding claim 24, Gupta et al. disclose a method for playing an audio
presentation on a receiver that is connected via a network to a source having a
multi-channel data structure representing the audio presentation, the method comprising:
determining available bandwidth on the network (col. 11, lines 44-53); selecting a first
channel of the multi-channel data structure from a plurality of channels that represent
the audio presentation after time-scaling by a desired time-scaling factor (col. 11, line 44
to col. 12, line 67); receiving a first frame from the first channel (col. 11, line 44 to col.
12, line 67); and playing the first frame (col. 11, line 44 to col. 12, line 67).

15. Regarding claim 25, Gupta et al. further disclose determining bandwidth
available on the network after receiving the first frame (col. 8, line 19 to col. 9, line 67);
selecting a second channel of the multi-channel data structure from the plurality of
channels that represent the audio presentation after time-scaling by the desired
time-scaling factor (col. 11, line 44 to col. 12, line 67), wherein the second channel contains
data that is compressed using a second compression process that provides highest
audio quality at the bandwidth available after receiving the first frame; receiving a
second frame from the second channel (col. 11, line 44 to col. 12, line 67); and playing
the second frame after playing the first frame (col. 11, line 44 to col. 12, line 67).
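
The channel selection recited in claims 24-25 can be sketched as a choice among channels that share the desired time-scaling factor but differ in compression. This is an illustration under stated assumptions, not the method of Gupta et al.: `Channel`, `select_channel`, and the bitrate figures are hypothetical, and the sketch assumes that a higher bitrate means higher audio quality.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Channel:
    name: str
    time_scale: float   # time-scaling factor of the channel
    bitrate_kbps: int   # assumed proxy for audio quality


def select_channel(channels, time_scale, bandwidth_kbps):
    """Pick the highest-bitrate channel at the desired time scale that fits.

    Returns None when no channel at that time scale fits the bandwidth.
    """
    candidates = [c for c in channels
                  if c.time_scale == time_scale
                  and c.bitrate_kbps <= bandwidth_kbps]
    return max(candidates, key=lambda c: c.bitrate_kbps, default=None)


channels = [
    Channel("ts1-low", 1.0, 16),
    Channel("ts1-high", 1.0, 64),
    Channel("ts2-low", 2.0, 16),
]
first = select_channel(channels, 1.0, 100)   # bandwidth measured before playing
second = select_channel(channels, 1.0, 32)   # bandwidth measured after first frame
```

Repeating the selection after each bandwidth measurement lets the player move to a different compression at the same time-scaling factor between frames.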

Claim Rejections - 35 USC § 103 

16. The following is a quotation of 35 U.S.C. 103(a) which forms the basis for all 
obviousness rejections set forth in this Office action: 

(a) A patent may not be obtained though the invention is not identically disclosed or described as set 
forth in section 102 of this title, if the differences between the subject matter sought to be patented and 
the prior art are such that the subject matter as a whole would have been obvious at the time the 
invention was made to a person having ordinary skill in the art to which said subject matter pertains. 
Patentability shall not be negatived by the manner in which the invention was made. 

17. Claims 6-8 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Gupta et al. (US 6622171) in view of Taniguchi et al. (US 6484137). 

1. Regarding claim 6, Gupta et al. fail to disclose that the first audio channel
comprises plurality of frames, each frame having an index value that identifies a time
interval of the audio portion that the frame represents; the second audio channel
comprises plurality of frames, each frame in the second channel having an index value
that identifies a time interval of the audio portion that the frame represents. However,
Taniguchi et al. teach that the first audio channel comprises plurality of frames, each
frame having an index value that identifies a time interval of the audio portion that the
frame represents (col. 28, lines 30-65); the second audio channel comprises plurality of
frames, each frame in the second channel having an index value that identifies a time
interval of the audio portion that the frame represents (col. 28, lines 30-65).

Because Gupta et al. and Taniguchi et al. are analogous art from
the same field of endeavor, it would have been obvious to one of ordinary skill in the
art at the time of invention to modify Gupta et al. by incorporating the teaching of
Taniguchi et al. in order to determine data blocks subjected to time-scale
modification to minimize audio distortion.

18. Regarding claims 7-8, Gupta et al. further disclose that each frame in the first
and second data channels is separately compressed (figure 8 shows 3 versions of an
audio stream being encoded by 3 different time scale factors), and wherein the data
structure further comprises a data channel corresponding to a plurality of bookmarks,
wherein each bookmark has an index value and identifies graphics, the index value
indicating a display time for the graphics relative to playing of the frames of the first or
second audio channel (col. 4, lines 1-20).

19. Claims 22-23 are rejected under 35 U.S.C. 103(a) as being unpatentable over 
Taniguchi et al. (US 6484137) in view of Gupta et al. (US 6622171). 

20. Regarding claim 22, Taniguchi et al. further disclose that the channel index 
values of frames further indicate respective compression processes for the frames, and 
wherein the method further comprises: selecting the second channel index value (col.
28, lines 51-65) from a plurality of channel index values that identify the second time
scaling factor (col. 15, line 3 to col. 16, line 19).

Taniguchi et al. fail to disclose the steps of determining available bandwidth on
the network, and wherein the second channel index indicates a compression process
that provides highest audio quality at the available bandwidth. However, Gupta et al. teach
the steps of determining available bandwidth on the network (col. 11, lines 44-53), and
wherein the second channel index indicates a compression process that provides 
highest audio quality at the available bandwidth (col. 11, line 44 to col. 12, line 67). 

Because Taniguchi et al. and Gupta et al. are analogous art from
the same field of endeavor, it would have been obvious to one of ordinary skill in the
art at the time of invention to modify Taniguchi et al. by incorporating the teaching of
Gupta et al. in order to determine data blocks subjected to time-scale modification
to minimize audio distortion.

21. Regarding claim 23, Taniguchi et al. further disclose that the channel index
values of frames further indicate respective compression processes for the frames, and
wherein the method further comprises: selecting a third channel index value from a
plurality of channel index values that identify the second time scaling factor (col. 14, line
7 to col. 16, line 19), requesting from the source a third audio frame that has the third
channel index value, which identifies the third audio frame as being time-scaled by the
second time-scaling factor (each frame in the frame sequence in col. 12, lines 12-13
has a different time-scaling factor, e.g. "a"=1 and "b"=1/2, for more detail, referring to
figure 4c); and playing the third frame after the second frame (col. 14, line 7 to col. 16,
line 19).

Taniguchi et al. fail to disclose the steps of determining available bandwidth on 
the network, wherein the third channel index indicates a compression process that 
provides highest audio quality at the available bandwidth. However, Gupta et al. teach 
the steps of determining available bandwidth on the network (col. 11, lines 44-53), and
wherein the third channel index indicates a compression process that provides highest 
audio quality at the available bandwidth (col. 11, line 44 to col. 12, line 67). 

Because Taniguchi et al. and Gupta et al. are analogous art from
the same field of endeavor, it would have been obvious to one of ordinary skill in the
art at the time of invention to modify Taniguchi et al. by incorporating the teaching of
Gupta et al. in order to determine data blocks subjected to time-scale modification
to minimize audio distortion.

Conclusion 

Any inquiry concerning this communication or earlier communications from the 
examiner should be directed to Huyen Vo whose telephone number is 703-305-8665. 
The examiner can normally be reached on M-F, 9-5:30. 

If attempts to reach the examiner by telephone are unsuccessful, the examiner's 
supervisor, Doris To, can be reached on 703-305-4827. The fax phone number for the
organization where this application or proceeding is assigned is 703-872-9306.

Information regarding the status of an application may be obtained from the
Patent Application Information Retrieval (PAIR) system. Status information for 
published applications may be obtained from either Private PAIR or Public PAIR. 
Status information for unpublished applications is available through Private PAIR only. 
For more information about the PAIR system, see http://pair-direct.uspto.gov. Should 
you have questions on access to the Private PAIR system, contact the Electronic 
Business Center (EBC) at 866-217-9197 (toll-free). 

Huyen X. Vo February 15, 2005